Sage Advice

Using Remote Sensing Imagery to Determine Vegetation Structure in Sagebrush Steppe Habitats

Authors:

Sarah Jaffe, CU Boulder Environmental Studies
Kelsey Beckrich, CU Boulder Earth Data Analytics
In Partnership with NatureServe and the Bureau of Land Management Sunset on Seedskadee National Wildlife Refuge
Figure: Sunset on Seedskadee National Wildlife Refuge, WY
Image Source: USFWS Mountain-Prairie, Tom Koerner
Easily overlooked, sagebrush ecosystems are an integral part of the Intermountain West. Before massive habitat fragmentation, sagebrush ecosystems were estimated to cover 150 million acres in the Western United States. Now, roughly half of that habitat remains, and of the remaining habitat, only 10 percent is considered representative of original, untouched conditions. This means that the vast majority of sagebrush ecosystems are at a tipping point - pushed in one direction by conservation efforts and in the other by land use - oil and gas extraction, agricultural practices, commercial development - as well invasive cheatgrass which can dramatically change landscapes by prolonging fire seasons. Idaho wildfire
Figure: Idaho Fire Burns Through Sagebrush Habitat Fragmented by Cheatgrass.
Image Source: Mountain-Prairie, Michael Pellant/BLM
So why do we care? Obviously many ecosystems are in a state of flux, and while cheatgrass plays a role in increasing the frequency and intensity of wildfires, it is certainly not the only causal factor - or even the most significant. The reality is that, nondescript as they look, sagebrush ecosystems support hundreds of species of flora and fauna . When we lose sagebrush, we lose the species that depend on sagebrush ecosystems and many of these species are significant. Eagles, swift foxes, sage grouse, prairie dogs, black footed ferrets, and countless other plants and animals live in sagebrush ecosystems. Many of these species names are recognizable because at one point or another they have been (or should be) listed as threatened or endangered. The most pertinent of these, being used as an umbrellas species, is the Greater Sage Grouse, an obligate sagebrush ecosystem inhabitant. Greater Sage-Grouse (Centrocercus urophasianus) DSC_0145
Figure: Male Greater Sage-Grouse (Centrocercus urophasianus)
Image Source: Dan Dzurisin
Video: Sage Grouse Lekking Site
Image Source: USFWS Mountain-Prairie, K. Theule

Can Remote Imagery Accurately Model Sagebrush Vegetation?

This brings us to the crux of the issue. Even just looking at the current extent of Sage Grouse habitat, it is clear that even in reduced numbers, sagebrush still covers a highly dispersed area in the Intermountain West. Developing effective conservation practices requires monitoring more land than is practical through traditional means. A 2015 study by Doherty et al., using 30$m^{2}$ and 1$km^{2}$ resolution remote imagery, found that the accuracy of these images to model habitat was highly dependent on scale due to localized variation in sage grouse habitats. We will be exploring whether higher spatial resolution remote imagery can accurately model vegetation structure and composition in these areas, and if so, which scales are most informative.

Current and Historical Sage Grouse Habitat Distribution

Figure: The current and historical habitat distributions of the Greater Sage Grouse
Image Source: Aldridge et al. 2008

Methods

Site Selection

For our project, we selected two NEON sites representative of the sagebrush ecosystem with a range of openly available remote sensing data. We chose the National Ecological Observatory Network's (NEON's) Central Plains Experimental Range (CPER) in Weld County, Colorado, and the Onaqui field site (ONAQ) in the Onaqui Mountains of Nevada. They contain a large amount of sagebrush habitat. See section 2.4 for site details.

NEON Field Sites

Figure: All NEON field sites. CPER in and ONAQ are core terrestrial sites representative of the Central Plains and Great Basin regions, respectively, seen in green.
Image Source: NEON Field Sites

Python Package Imports

This notebook uses a variety of python packages to ensure reproducibility across operating systems, collect data, structure and visualize data, perform calculations, and map sites.

In [1]:
# Import packages
import os
import matplotlib.pyplot as plt
import seaborn as sns
import requests
import urllib
import folium
import numpy as np
import pandas as pd
from pandas.io.json import json_normalize
import geopandas as gpd
from shapely.geometry import shape
from shapely.geometry import Polygon
from shapely.geometry import box
from shapely.geometry import Point
import rasterio as rio
from rasterio.mask import mask
from rasterio.plot import plotting_extent
import rasterstats as rs
import earthpy as et
import earthpy.plot as ep

# Set working directory
os.chdir(os.path.join(et.io.HOME, 'earth-analytics'))

Functions

We used functions for all of our preprocessing work. We accessed NEON's insitu and Canopy Height Model (CHM) datasets via API calls, created shapefiles from dataframes, cross-referenced CHM and insitu coordinates, and called only raster CHM tiles we needed for vegetation structure comparison using custom functions. Finally, we compared the accuracy of stem heights between insitu data plots (40$m^{2}$) and Canopy Height Models (CHM, 1$km^{2}$, with 1$m$ resolution). In order to compare the accuracy of insitu collected vegetation structure data with high resolution CHM data, we plotted the maximum and median of both sets of data (although, we also calculated the minimum and mean).

In [2]:
def open_ecosystem_structure(site, date):
    '''Uses API call to retrieve NEON ecosystem structure (CHM)
    data at a given site and date. Returns  list of all rasters
    within data product. For more information on NEON ecosystem 
    structure data and a full list of available dates see
    https://data.neonscience.org/data-products/DP3.30015.001
    
    Parameters
    ----------
    site : str
        4 Letter site name. See 
        https://www.neonscience.org/field-sites/field-sites-map/list
        for a full list of NEON sites
        
    date : str
        Date of data collection in yyyy-mm format
    
    Returns
    -------
    CHM_raster_tiles : .tif
        All raster .tif tiles associated with
        the site and date specified   
    '''
    data_product_url = ['https://data.neonscience.org/api/v0/data/DP3.30015.001/'
                        + site+'/'+date]
    call_response = requests.get(data_product_url[0])
    call_response.json()
    
    CHM_raster_tiles = []
    
    for i in call_response.json()['data']['files']:
        data_file_url = i['url']
        file_format = data_file_url.find('.tif')
        if not file_format == -1:
            CHM_raster_tiles.append(data_file_url)
    
    return CHM_raster_tiles


def open_woody_veg_structure(site, date):
    '''Uses API call to retrieve NEON product data for woody 
    vegetation structure. Returns pandas of merged apparent 
    individual, mapping and tagging, and per plot per year
    documents, eg one dataframe with locational, species, 
    and height data. Also returns a pandas dataframe of filtered 
    plot data to facilitate geospatial merges and calculation of
    raster stats. For more information on NEON woody vegetation 
    structure data products and available dates, see
    https://data.neonscience.org/data-products/DP1.10098.001
    
    Parameters
    ----------
    site : str
        4 Letter site name. See 
        https://www.neonscience.org/field-sites/field-sites-map/list
        for a full list of NEON sites
    
    date : str
        Date of data collection in yyyy-mm format
    
    Returns
    -------
    all_merged_df : pandas.core.frame.DataFrame
        Pandas dataframe of merged measurement, plot, and mapping
        tabular files from data product
    
    plot_df : pandas.core.frame.DataFrame
        Pandas dataframe of perplotperyear.csv locational data
    '''
    data_product_url = ['https://data.neonscience.org/api/v0/data/DP1.10098.001/'
                        + site+'/'+date]
    call_response = requests.get(data_product_url[0])
    
    all_urls = []
    
    for i in call_response.json()['data']['files']:
        data_file_url = i['url']
        height_find = data_file_url.find('individual')
        plot_find = data_file_url.find('perplot')
        map_find = data_file_url.find('mapping')
        
        if not height_find == -1:
            apparent_df = pd.read_csv(data_file_url)
        elif not plot_find == -1:
            plot_df = pd.read_csv(data_file_url)
        elif not map_find == -1:
            map_df = pd.read_csv(data_file_url)
    
    apparent_df = apparent_df[[
        'plotID', 'individualID', 'height']]
    
    plot_df = plot_df[['plotID', 'plotType',
                       'decimalLatitude', 'decimalLongitude',
                       'easting', 'northing']]
    
    map_df = map_df[['plotID', 'individualID', 'scientificName']]
    
    measurement_map_merge = pd.merge(
        apparent_df, map_df, on=['plotID', 'individualID'])
    all_merged_df = pd.merge(plot_df, measurement_map_merge, on='plotID')
    
    return all_merged_df, plot_df


def NEON_site_extent(path_to_NEON_boundaries, site):
    '''Extracts a NEON site extent from an individual site as
    long as the original NEON site extent shape file contains 
    a column named 'siteID'.

    Parameters
    ----------
    path_to_NEON_boundaries : str
        The path to a shape file that contains the list
        of all NEON site extents, also known as field
        sampling boundaries (can be found at NEON and
        ESRI sites)

    site : str
        One siteID contains 4 capital letters, 
        e.g. CPER, HARV, ONAQ or SJER.

    Returns
    -------
    site_boundary : geopandas.geodataframe.GeoDataFrame
        A vector containing a single polygon 
        per the site specified.        
    '''
    NEON_boundaries = gpd.read_file(path_to_NEON_boundaries)
    boundaries_indexed = NEON_boundaries.set_index(['siteID'])

    site_boundary = boundaries_indexed.loc[[site]]
    site_boundary.reset_index(inplace=True)

    return site_boundary


def buffer_point_plots(df, crs, buffer):
    '''Creates geodataframe from plot points
    within a designated coordinate reference system. 
    Buffers plot points to a given radius. Compatible
    with most NEON tabular plot data files including
    northing and easting locational columns. Final product
    can be used to visualize plot locations or combined 
    with other spatial data products.
    
    Parameters
    ----------
    df : pandas.core.frame.DataFrame
        df including Northing and Easting plot locations
    
    crs : str or rasterio.crs.CRS
        String of desired coordinate reference system
    
    buffer : int
        Desired radius for final plot polygons
    
    Returns
    -------
    buffered_gdf : geopandas.geodataframe.GeoDataFrame
       Dataframe with point plots buffered to polgyons
    '''
    buffered_gdf = gpd.GeoDataFrame(
        df, geometry=gpd.points_from_xy(
            x=df.easting, y=df.northing), crs=crs)
    
    buffered_gdf['geometry'] = buffered_gdf.geometry.buffer(buffer)
    
    return(buffered_gdf)


def tiles_over_insitu_plots(tiles, plots):
    '''Takes a list of raster images and geodataframe
    of plot polygons within the same CRS. Cross references
    overlap between raster extent polygon and plot point 
    polygons. Returns list of .tiff file locations that 
    overlap completely with plot polygons.
    ----------
    tiles : list
        List of rasters
    
    plots : geopandas.geodataframe.GeoDataFrame
        Geodataframe with polygons of AOI plots
    
    Returns
    -------
    target_rasters : list
        List of strings with to raster locations
    '''
    target_rasters = []
    
    insitu_plot_size = plots.loc[0, 'geometry'].area
    
    for tile in tiles:
        with rio.open(tile) as src:
            extent = plotting_extent(src)
        raster_polygon = Polygon([
            [extent[0], extent[2]],
            [extent[0], extent[3]],
            [extent[1], extent[3]],
            [extent[1], extent[2]]])
        raster_polygon_gdf = gpd.GeoDataFrame(crs=src.crs,
                                              geometry=[raster_polygon])
        raster_plot_intersection = gpd.overlay(
            raster_polygon_gdf, plots, how='intersection')
        
        if raster_plot_intersection['geometry'].empty:
            pass
        elif int(
                raster_plot_intersection.loc[0, 'geometry'].area) == int(
                insitu_plot_size):
            target_rasters.append(tile)
    
    return target_rasters


def calculate_rasterstats_dataframe(tiles, plot_polygons):
    '''Creates a geodataframe object with lidar summary statistics 
    using rasterstats zonal. Requires a pandas dataframe with plot
    polygons to cross reference with lidar calculations. Outputs a
    single dataframe with uniquely named summary statistics.
    Parameters
    ----------
    df : pandas.core.frame.DataFrame
        df including Northing and Easting plot locations
    
    crs : str or rasterio.crs.CRS
        String of desired coordinate reference system or 
        rasterio CRS object
    
    Returns
    -------
    CHM_stats : list of geopandas.geodataframe.GeoDataFrame
        Returns a list of geodataframes with lidar max, lidar mean, 
        lidar median and lidar min calculated in new columns.
        calculations. 
    '''
    CHM_stats = []
    
    for tile in tiles:
        with rio.open(tile) as chm_src:
            site_chm = chm_src.read(1, masked=True)
            site_chm_meta = chm_src.meta
        site_tree_heights = rs.zonal_stats(
            plot_polygons,
            site_chm,
            affine=site_chm_meta["transform"],
            geojson_out=True,
            copy_properties=True,
            nodata=0,
            stats=["mean", "median", "max", "min"])
        site_tree_heights_gdf = gpd.GeoDataFrame.from_features(
            site_tree_heights)
    
    rename_dict_lidar = {"mean": "lidar_mean",
                         "median": "lidar_median",
                         "max": "lidar_max",
                         "min": "lidar_min"}

    site_tree_heights_gdf.rename(columns=rename_dict_lidar, inplace=True)
    CHM_stats.append(site_tree_heights_gdf)
    
    return CHM_stats

Data Acquisition

All data for this project is openly available through the National Ecological Observatory Network (NEON). The following Data Products are were used:

  • Ecosystem Structure (Canopy Height Model)
    • NEON'S Ecosystem Structure data includes a LIDAR derived CHM
      • CPER Ecosystem Structure collected May 2017
      • ONAQ Ecosystem Structure collected June 2017
  • Woody Vegetation Structure
    • NEON's Woody Vegetation Structure includes three tabular data files with data on vegetation height, plot location, taxonomic observations, and other general on-the-ground observations.
      • CPER Woody Vegetation Structure collected September 2017
      • ONAQ Woody Vegetation Structure collected September 2017
In [3]:
# Download shapefile of all NEON site boundaries
url = 'https://www.neonscience.org/neon-terrestrial-field-site-boundaries-shapefile'
et.data.get_data(url=url, replace=True)

# Create path to shapefile
terrestrial_sites = os.path.join(
    'data', 'earthpy-downloads',
    'fieldSamplingBoundaries (1)',
    'terrestrialSamplingBoundaries.shp')

# Import insitu plot data for CPER and ONAQ sites
CPER_insitu_df, CPER_plots = open_woody_veg_structure(
    site='CPER', date='2017-09')
ONAQ_insitu_df, ONAQ_plots = open_woody_veg_structure(
    site='ONAQ', date='2017-09')

# Import CHM data and identify crs
CPER_tif_files = open_ecosystem_structure(
    site='CPER', date='2017-05')
with rio.open(CPER_tif_files[0]) as CPER_src:
    CPER_crs = CPER_src.crs

ONAQ_tif_files = open_ecosystem_structure(
    site='ONAQ', date='2017-06')
with rio.open(ONAQ_tif_files[0]) as ONAQ_src:
    ONAQ_crs = ONAQ_src.crs

# Create geodataframes with buffered plot points
CPER_insitu_gdf = buffer_point_plots(
    df=CPER_insitu_df, crs=CPER_crs, buffer=40)
ONAQ_insitu_gdf = buffer_point_plots(
    df=ONAQ_insitu_df, crs=ONAQ_crs, buffer=40)
Downloading from https://www.neonscience.org/neon-terrestrial-field-site-boundaries-shapefile
Extracted output to C:\Users\17205\earth-analytics\data\earthpy-downloads\fieldSamplingBoundaries (1)

Central Plains Experimental Region (CPER)

CPER is a 6,300 hectare NEON site located inside Pawnee National Grasslands in Weld County, Colorado. It is a mixed sagebrush/shortgrass steppe ecosystem that is fragmented by cattle grazing and oil and gas extraction. See below for an interactive map with the CPER site and plot points outlined. Pawnee Buttes Evening Sky

Figure: Mixed shortgrass, sagebrush landscape at Pawnee National Grasslands

Image Source: Michael Kirsh

In [4]:
# Get shapefile of CPER site extent
CPER_site_outline = NEON_site_extent(
    path_to_NEON_boundaries=terrestrial_sites,
    site='CPER')
# Convert plot locations and site outline to JSONs in correct CRS
CPER_plot_locations_json = CPER_insitu_gdf.to_crs(epsg=4326).to_json()
CPER_site_outline_json = CPER_site_outline.to_crs(epsg=4326).to_json()

# Create scalable map with CPER site location and plot points
CPER_map = folium.Map([40.78, -104.7456],
                      zoom_start=10,
                      tiles='Stamen Toner')

CPER_points = folium.features.GeoJson(
    CPER_plot_locations_json, tooltip='CPER Site')
CPER_site = folium.features.GeoJson(CPER_site_outline_json)

CPER_map.add_child(CPER_site).add_child(CPER_points)
CPER_map
Out[4]:

Onaqui (ONAQ)

The Onaqui Mountains, southwest of Salt Lake City, are home to the Onaqui Wild Horses. It is a sagebrush steppe ecosystem that is periodically grazed by livestock and fragmented by cheatgrass. See below for an interactive map with the NEON ONAQ site and plot locations overlayed. Onaqui Wild Horse Gather 02/20/12

Figure: Horses near the Onaqui Mountains

Image source: Bureau of Land Management

In [5]:
# Import shapefile of NEON site
ONAQ_site_outline = NEON_site_extent(
    path_to_NEON_boundaries=terrestrial_sites, site='ONAQ')
# Convert site outline and plot points to JSONs
ONAQ_plot_locations_json = ONAQ_insitu_gdf.to_crs(epsg=4326).to_json()
ONAQ_site_outline_json = ONAQ_site_outline.to_crs(epsg=4326).to_json()

# Create interactive map with ONAQ site outline and plot points
ONAQ_map = folium.Map([40.17759, -112.45244],
                      zoom_start=10,
                      tiles='Stamen Toner')
ONAQ_points = folium.features.GeoJson(
    ONAQ_plot_locations_json, tooltip='ONAQ Site')
ONAQ_site = folium.features.GeoJson(ONAQ_site_outline_json)

ONAQ_map.add_child(ONAQ_site).add_child(ONAQ_points)
ONAQ_map
Out[5]:
In [6]:
# Create aoi list overlapping CHM tiles with insitu plots
# using function
CPER_AOI_tifs = tiles_over_insitu_plots(
    tiles=CPER_tif_files, plots=CPER_insitu_gdf)
ONAQ_AOI_tifs = tiles_over_insitu_plots(
    tiles=ONAQ_tif_files, plots=ONAQ_insitu_gdf)

# Open only the CHM tiles in aoi
with rio.open(CPER_AOI_tifs[0]) as CPER_src:
    CPER_CHM = CPER_src.read(1, masked=True)
    CPER_extent = plotting_extent(CPER_src)
with rio.open(ONAQ_AOI_tifs[0]) as ONAQ_src:
    ONAQ_CHM = ONAQ_src.read(1, masked=True)
    ONAQ_extent = plotting_extent(ONAQ_src)

Unexpected results, but all is not lost

Turns out that our focal species, Artemsia tridentata, also known as Great Basin Sagebrush or Big Sagebrush, is actually not that big. In addition to not-so-big Big Sagebrush, these ecosystems are often characterized by grasses and other short vegetation. Unfortunately, this vegetation is too short to be preserved in NEON CHMs since NEON discards all measurements less than 2 meters.

In [7]:
# Visualizing height distribution at focal sites
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
fig.suptitle('Histograms of Insitu Height Frequency',
             x=0.5, y=1.0, fontsize=14)

CPER_insitu_gdf['height'].plot.hist(
    ax=axes[0], color=['palegoldenrod'],
    title=('CPER May 2017'))

ONAQ_insitu_gdf['height'].plot.hist(
    ax=axes[1], color=['burlywood'],
    title='ONAQ June 2017')

for i in axes:
    i.set_xlabel('Stem Heights (m)')
    i.set_yticks([100, 200, 300, 400])

axes[0].text(0.5, -0.165, 'Data Source: NEON',
             ha='center', transform=axes[0].transAxes,
             fontsize=10)
axes[1].text(1.71, -0.165, 'Data Source: NEON',
             ha='center', transform=axes[0].transAxes,
             fontsize=10)
plt.show()
In [8]:
# Visualizing CHM aoi's from the CPER and ONAQ sites
#####ADD MARKDOWN FOR THESE #####
### UNITS ###
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 12))
fig.suptitle(
    'Plots of Canopy Height Model Rasters Over Sagebrush Ecosystems',
    x=0.5, y=0.72, fontsize=14)
ep.plot_bands(CPER_CHM, cmap='BrBG', ax=ax1,
              title='CHM of CPER May 2017',
              scale=False)
ax1.text(0.5, -0.075, 'Data Source: NEON',
         ha='center', transform=ax1.transAxes, fontsize=10)
ep.plot_bands(ONAQ_CHM, cmap='BrBG', ax=ax2,
              title='CHM of ONAQ June 2017',
              scale=False)
ax2.text(0.5, -0.075, 'Data Source: NEON',
         ha='center', transform=ax2.transAxes, fontsize=10)
plt.show()

CPER and ONAQ Results

As you can see, the maximum height of all stems in either CPER or ONAQ is less than 2m, as measured by hand. This is visualized in the CHM raster plots which show no gradation within the 0-2 meter range. Therefore, all of our lidar data would show zero similarity to the insitu measurement results (all lidar stem heights would be 'None', reflecting all stems <2m being dropped). However, with our reproducible code, we are able to check to see if other sites are able to produce desired results, validating our workflow.

Testing our approach in the Great Smokey Mountains (GRSM)

NEON's Great Smokey Mountains site (GRSM) in Tennessee (seen in the Figure with all NEON field sites map) is a deciduous and evergreen forest with a wide variety of stem heights. We used the same preprocessing and statistics approach to compare the insitu plots and CHM rasters. The heterogeneous vegetation will result in CHM data >2m, thereby allowing us to explore the accuracy of the 1m resolution CHM data with insitu measured data more clearly.

  • Ecosystem Structure data were collected October 2017
  • Woody Vegetation Structure were collected October 2017
In [9]:
# Get raster data and source crs
GRSM_tif_files = open_ecosystem_structure(
    site='GRSM', date='2017-10')

with rio.open(GRSM_tif_files[0]) as GRSM_src:
    GRSM_crs = GRSM_src.crs

# Get insitu data and create geodataframe for insitu
GRSM_insitu_df, GRSM_plots = open_woody_veg_structure(
    site='GRSM', date='2017-10')

# Create plot geodataframe with plot data only for rs.zonal
GRSM_plot_gdf = buffer_point_plots(
    df=GRSM_plots, crs=GRSM_crs, buffer=40)

#Create plot geodataframe with height data for histogram
GRSM_insitu_gdf = buffer_point_plots(
    df=GRSM_insitu_df, crs=GRSM_crs, buffer=40)

GRSM_AOI_tifs = tiles_over_insitu_plots(
    tiles=GRSM_tif_files, plots=GRSM_plot_gdf)

with rio.open(GRSM_AOI_tifs[0]) as GRSM_src:
    GRSM_CHM = GRSM_src.read(1, masked=True)
    GRSM_extent = plotting_extent(GRSM_src)
In [10]:
# ADD PLOTS
fig, axes = plt.subplots(1, 2, figsize=(12, 7))
fig.suptitle('Exploring GRSM CHM and Stem Heights')

ep.plot_bands(GRSM_CHM, cmap='BrBG', ax=axes[0],
              title='CHM of GRSM May, 2017',
              scale=False)

GRSM_insitu_gdf['height'].plot.hist(
    ax=axes[1], color=['darkseagreen'],
    title='Distribution of Stem Hieghts (m)\n'
          'at GRSM Oct, 2017')

axes[1].set_xlabel('Stem Heights (m)')
axes[1].set_yticks([25, 50, 75, 100, 125])

axes[0].text(0.155, -0.1, 'Data Source: NEON',
             ha='center', transform=axes[0].transAxes,
             fontsize=10)
plt.show()
In [11]:
# Use function including raster stats to calculate CHM statistics
GRSM_tree_heights_gdf = calculate_rasterstats_dataframe(
    tiles=GRSM_AOI_tifs, plot_polygons=GRSM_plot_gdf)

# For this site, only one lidar tile overlaps
# This extracts the focal tile
GRSM_tree_heights_gdf = GRSM_tree_heights_gdf[0]

# Create summary statistics for insitu measurements
GRSM_insitu_stats = GRSM_insitu_gdf.groupby("plotID").agg([
    "mean", "median", "max", "min"])["height"]

# Rename max/mean columns to include 'insitu'
GRSM_insitu_stats.rename(columns={"mean": "insitu_mean",
                                  "median": "insitu_median",
                                  "max": "insitu_max",
                                  "min": "insitu_min"},
                         inplace=True)

# # Merge insitu mean/max df with the lidar df
GRSM_CHM_insitu = GRSM_tree_heights_gdf.merge(
    GRSM_insitu_stats,
    left_on="plotID",
    right_on="plotID")
In [12]:
# Plot 1 & 2 - GRSM max and mean
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 7))
fig.suptitle('Great Smokey Mountains, TN, October 2017'
             '\n Calculated Lidar vs. Insitu Stem Heights',
             fontsize=14)

# Add diagonal 1:1 and regression lines
ax1.plot((0, 1), (0, 1), transform=ax1.transAxes, ls='--', c='k')
sns.regplot(GRSM_CHM_insitu.lidar_max,
            GRSM_CHM_insitu.insitu_max,
            ax=ax1)

ax2.plot((0, 1), (0, 1), transform=ax2.transAxes, ls='--', c='k')
sns.regplot(GRSM_CHM_insitu.lidar_median,
            GRSM_CHM_insitu.insitu_median, ax=ax2)

# Set title and labels for axes
ax1.set(xlabel="Max Lidar Stem Height (m)",
        ylabel="Max Insitu Stem Height (m)",
        xlim=(0, 50), ylim=(0, 50),
        title='GRSM Max Height Comparison')

ax2.set(xlabel="Median Lidar Stem Height (m)",
        ylabel="Median Insitu Stem Height (m)",
        xlim=(0, 50), ylim=(0, 50),
        title='GRSM Median Height Comparison')

ax1.text(0.155, -0.15, 'Data Source: NEON',
         ha='center', transform=ax1.transAxes, fontsize=10)

plt.show()

GRSM Results

The comparisons between GRSM maximum and median stem heights using insitu and lidar measurements are mixed and limited. It appears that there may be a strong relationship between the maximum insitu and lidar height measurements, but a poor association between the medians. The 4 plots that had 100% overlapping lidar tiles, with stems greater than 2m, are not strong enough results to assess the 1m resolution lidar data at this stage. However, further exploration is needed to judge lidar's usefulness in machine learning analysis.

While our present results are currently unable to contribute to our goals of assessing lidar data's accuracy of modeling sagebrush vegetation, we do know that we have a workflow for assessing CHM data in the future and that lidar has potential to contribute to assessing habitats in larger scales. There is a lot of work left to do.

Next Steps

Our pilot project produced code that will work, once we find/create appropriate CHM data. However, our goals are to analyze the accuracy of a range of remote sensing products at different resolutions that can best describe sage grouse habitat at larger scales. Looking forward, minimally we would like to continue to explore CPER and ONAQ sites by:

  1. Creating our own CHM data using raw, discrete point clouds (including vegetation <2m height)
  2. Comparing other products for accurate descriptions of vegetation structure, such as multispectral and hyperspectral data at varying resolutions
  3. Test machine learning to explore our ability to describe larger landscapes of sage grouse habitat range

Ideally, our next steps will also include incorporating resource or habitat modeling for sage grouse and/or using our code to reproduce results at other sites outside of NEON (e.g. Western Wyoming), but this will largely depend on time.

Although our current report was not able to derive the results we needed to assess lidar data's accuracy modeling sagebrush habitat, it does provide us with a workflow that will carry us forward. After we create our own CHM data, we will not only be able to assess lidar's ability to measure stems that are <2m in height, we will also be able to continue our quest in assessing other remote sensing instruments ability to characterize vegetation structure and composition in these environments. The overall project will create a workflow that helps wildlife and land managers prioritize sage grouse lekking sites, therefore many plants and animal habitats, with data that cannot possibly all be measured in person.